REMARKS



Claims 1-16 and 35-50 are pending. Claims 35-50 are new. Claims 17-34 have been 
canceled in response to the Office Action's request for clarification of the numbering. Claims 
19-34 presented in the Response filed March 18, 2005 now correspond to new claims 35-50, 
respectively. Likewise, claims 17-32 presented in the Response filed November 7, 2005 also 
correspond to new claims 35-50, respectively. 

Claims 1 and 2 are herein amended. Claims 3-9, 11 and 16 were previously amended.

Claims 35 and 36, while new, are modified from now-canceled claims 17 and 18 as filed
in the Response of November 7, 2005, in the same manner as claims 1 and 2 have been amended.
Support for these amendments and modifications is fully described below in response to the
Office's request to amend respective terms in each of the foregoing claims.

New claims 37-50 are copied directly from claims 19-32 as presented in the Response 
filed November 7, 2005. 

For the Examiner's convenience, responses herein have been numbered to correspond to 
the appropriate objection or rejection in the Office Action. 

It should be noted that the correct Attorney Docket No. for this application is 
"77670/593." 

New Matter Objection to Amendment to Specification 

[6] The Office Action objects to amendments to the specification filed on March 18,
2005. The Office asserts that Applicants have failed to point to express support for the recited
limitation D1D2X1(X2X3)X4D3. The Office suggests Applicants have argued that
D1D2X1(X2X3)X4D3 is only implicitly supported by specific examples in the specification and
that these specifically disclosed species fail to describe the genus of sequences having the formula
D1D2X1(X2X3)X4D3. The Office further urges that Applicants do not dispute that
D1D2X1(X2X3)X4D3 alters the positioning of the parentheses such that it encompasses amino acid
sequence permutations that are not supported by the original disclosure.

Applicants respectfully traverse the Office's assertions and conclusions. First, the
expressly disclosed D1D2X1X2(X3X4)D3 is mathematically equivalent to D1D2X1(X2X3)X4D3, and
the genus of sequences encompassed by these two algorithms is exactly the same genus. Second,
in the response of November 7, 2005, Applicants provided a mathematical proof that
D1D2X1X2(X3X4)D3 encompasses the same sequences as D1D2X1(X2X3)X4D3. As such,
Applicants again traverse the assertion that D1D2X1(X2X3)X4D3 creates sequence permutations
that are not supported by the original disclosure.

Contrary to the Office's understanding, support for D1D2X1(X2X3)X4D3 is not implicit in
D1D2X1X2(X3X4)D3; instead, support is explicit because the algorithm D1D2X1X2(X3X4)D3 is
mathematically exactly equivalent to D1D2X1(X2X3)X4D3 and, as such, these two algorithms
encompass the exact same genus of sequences. Applicants provide the following chart to
demonstrate the mathematical equivalency of D1D2X1(X2X3)X4D3 and D1D2X1X2(X3X4)D3 and
the fact that these algorithms encompass the exact same genus of sequences.
Left column: D1D2X1X2(X3X4)D3
Possible outcomes where X is any amino acid and X3 and/or X4 are present or absent:

I. D1D2X1X2D3 (X3 and X4 are absent)

II. D1D2X1X2X3D3 (X4 is absent)

III. D1D2X1X2X4D3 (X3 is absent)

IV. D1D2X1X2X3X4D3 (X3 and X4 are present)

Right column: D1D2X1(X2X3)X4D3
Possible outcomes where X is any amino acid and X2 and/or X3 are present or absent:

I. D1D2X1X4D3 (X2 and X3 are absent)
Since X4 is any amino acid, the range of possible amino acids for X4 is the same as the range of
possible amino acids for X2 in the left-hand column: X4 (amino acid 1-20) = X2 (amino acid 1-20).
D1D2X1X4D3, therefore, represents the same range of sequence possibilities as D1D2X1X2D3
(Section I) in the left column.

II. D1D2X1X2X4D3 (X3 is absent)
Since X4 is any amino acid, the range of possible amino acids for X4 is the same as the range of
possible amino acids for X3 in the left-hand column: X4 (amino acid 1-20) = X3 (amino acid 1-20).
D1D2X1X2X4D3, therefore, represents the same range of sequence possibilities as D1D2X1X2X3D3
(Section II) in the left column.

III. D1D2X1X3X4D3 (X2 is absent)
Since X3 is any amino acid, the range of possible amino acids for X3 is the same as the range of
possible amino acids for X2 in the left-hand column: X3 (amino acid 1-20) = X2 (amino acid 1-20).
D1D2X1X3X4D3, therefore, represents the same range of sequence possibilities as D1D2X1X2X4D3
(Section III) in the left column.

IV. D1D2X1X2X3X4D3 (X2 and X3 are present)
D1D2X1X2X3X4D3 is equivalent to D1D2X1X2X3X4D3 (Section IV) in the left column.
The foregoing table demonstrates that, contrary to the Office's conclusion, the position of
the parentheses in the algorithm creates no change in the genus of amino acid sequences
encompassed therein. Hence, Applicants submit that the genus of the sequence algorithm
D1D2X1(X2X3)X4D3 is fully supported in the specification.
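The equivalence shown in the chart can also be checked mechanically. The following Python sketch is offered only as an illustration, under the chart's own assumptions: X ranges over the 20 standard amino acids ("amino acid 1-20"), and each residue inside the parentheses is independently present or absent. The token scheme ("D", "X", "X?") and the helper name expand are hypothetical conveniences, not notation from the application.

```python
from itertools import product

# Assumption: X ranges over the 20 standard one-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expand(tokens):
    """Return every concrete sequence a motif formula covers.

    Hypothetical token scheme: "D" is a fixed aspartate, "X" is any amino
    acid (required), and "X?" is any amino acid or absent, mirroring the
    chart's reading of each parenthesized residue as present or absent.
    """
    choices = []
    for token in tokens:
        if token == "D":
            choices.append(["D"])
        elif token == "X":
            choices.append(list(AMINO_ACIDS))
        elif token == "X?":
            choices.append([""] + list(AMINO_ACIDS))  # "" means absent
        else:
            raise ValueError(f"unknown token: {token}")
    # Cartesian product of all slot choices, joined into strings; the set
    # removes duplicates that arise when different optional slots are absent.
    return {"".join(parts) for parts in product(*choices)}


# D1 D2 X1 X2 (X3 X4) D3
left = expand(["D", "D", "X", "X", "X?", "X?", "D"])
# D1 D2 X1 (X2 X3) X4 D3
right = expand(["D", "D", "X", "X?", "X?", "X", "D"])

print(left == right)  # True
print(len(left))      # 168400, i.e. 20**2 + 20**3 + 20**4
```

Both formulas reduce to D1D2 followed by two, three, or four arbitrary residues and then D3; moving the parentheses only changes which X label is attached to an absent position, which is why the two sets coincide.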

Furthermore, as extensively set forth in the response filed November 7, 2005, the
application is replete with sequences that reflect the algorithm D1D2X1(X2X3)X4D3. See, e.g.,
Figure 1 and column 4, lines 60-67; column 6, lines 48-63. Hence, the sequence
D1D2X1(X2X3)X4D3 in region II of the prenyl diphosphate synthases is supported by the original
disclosure at, for example, column 4, line 24. Withdrawal of the objection is therefore
respectfully requested.

Objection to format of amended claims 

[7] The Office Action objects to claims 17-32 for improper format. In response to the 
Office's objection, claims 17-34 have been canceled. Canceled claims 19-34 as presented in the 
Response filed March 18, 2005 now correspond to new claims 35-50, respectively. Likewise, 
canceled claims 17-32, as erroneously presented in the Response filed November 7, 2005, also 
correspond to new claims 35-50, respectively. 

Claims 35 and 36, while new, are modified from now-canceled claims 17 and 18 as filed 
in the Response of November 7, 2005. The modifications contained in claims 35 and 36 
correspond to the amendments to claims 1 and 2. Support for these amendments and 
modifications is fully described below in response to the Office's request to amend respective 
terms in each of the foregoing claims. 

New claims 37-50 are copied directly from claims 19-32 as presented in the Response
filed November 7, 2005.

Because new claims 35-50 comply with 37 C.F.R. § 1.173(b)(2), Applicants respectfully
request that the Office withdraw its objection to the format of the claims.

Office request and support for amendment to claims 1 and 17 to clarify synthesis of shorter 
chain lengths 

[8] The Office Action requests that claims 1 and 17 be amended to clarify how prenyl 
diphosphate synthesized by a mutant prenyl diphosphate synthase may be shorter than that 
synthesized by a corresponding wild-type enzyme. Claim 1 and claim 35 (corresponding to
claim 17) have been amended/modified to make such a clarification. The amendment to claim 1 
and modification of new claim 35 clarify that the mutant enzyme synthesizes a greater amount of 
prenyl diphosphate of a shorter length than does a wild-type enzyme. This amendment finds 
support, for example, in Figure 3; the illustration of Figure 3 at column 5, lines 15 to 28; and 
throughout the specification, including column 6, lines 17 to 21. As such, Applicants 
respectfully request the Office withdraw the rejection of claim 1 for clarity of synthesis of 
shorter chain lengths and enter new claim 35 for the same reason. 

Office request and support for amendment to claims 1 and 17 to clarify "Region II" 

[9] The Office has requested in the Office Actions of July 18, 2005 and January 3,
2006 that the claims be amended to clarify the position of the acid-rich domain of Region II by
identifying distinguishing characteristics of Region II. In response to the Office's request, the
distinguishing characteristics of Region II are presently clarified in amended claim 1 and new
claim 35 by establishing that the acid-rich domain sequence D1D2X1(X2X3)X4D3 or
D1D2X1X2(X3X4)D3 (which are nevertheless identical) must be contained within region II and the
sequence of region II must share at least 25% homology with region II of SEQ ID NO:2. SEQ
ID NO:2 is a geranylgeranyl diphosphate synthase of Arabidopsis thaliana that is disclosed at
column 4, lines 60 through 64 and in Figure 1.

Support for the amendment to claim 1 and for new claim 35, which clarify the term 
"Region II," may be found, for example, in claim 1, as originally filed; Figure 1 and the 
illustration of Figure 1 at column 4, line 60 through column 5, line 7; Example 1, in column 10;
Example 4, in column 12; Chen et al. Protein Science Vol. 3, pp. 600-607 (1994) (cited and 
discussed in application at column 5, lines 31-63) (a copy of which is provided herewith); and in 
the art as discussed by K. Kelly, "Exhaustive and Iterative Clustering of the Protein Databank" 
JCCG Winter 1998, http://www.chemcomp.com/joumal/familes.htm (a copy of which is 
provided herewith). 

Claim 1, as originally filed, recited the aspartic acid-rich domain of Region II of prenyl 
diphosphate synthase. As such, the application as filed supports this particular characterization 
of Region II. 
Support for the percent homology limitation of the claims, as amended, is found in all of
the prenyl diphosphate synthase sequences disclosed in the application. It is further supported by
knowledge in the art at the time of filing. One of skill in the art at the time of filing fully
understood that Region II of prenyl diphosphate synthases could be determined by locating the
aspartic acid-rich region D1D2X1(X2X3)X4D3 and implementing pairwise comparisons to
determine the homology of Region II as compared to a known prenyl diphosphate synthase
Region II. The skilled artisan's understanding is fully disclosed in Chen et al. Protein Science
Vol. 3, pp. 600-607 (1994), the findings of which are discussed in the application at column 5,
lines 31 through 63.
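The first step described in the paragraph above, locating the aspartic acid-rich region, can be illustrated with a regular expression. This is a hypothetical sketch, not a method described in the application or in Chen: it assumes the motif is D1D2 followed by two to four arbitrary residues and then D3 (the expansion given in the chart), and it uses a made-up toy sequence rather than any disclosed SEQ ID NO. The function name find_aspartate_motifs is illustrative only.

```python
import re

# Hypothetical pattern: D1 D2, then two to four residues drawn from the 20
# standard amino acids (X1, X2 and the optional pair), then D3.
# The zero-width lookahead lets overlapping candidate motifs be reported.
MOTIF = re.compile(r"(?=(DD[ACDEFGHIKLMNPQRSTVWY]{2,4}D))")


def find_aspartate_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched motif) for each candidate motif."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Toy sequence for illustration only; it is not taken from the application.
print(find_aspartate_motifs("GADDIMDSG"))  # [(2, 'DDIMD')]
```

Because the quantifier is greedy, the sketch reports the longest admissible motif at each starting position; the homology comparison of the surrounding Region II is a separate step.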

Chen demonstrates that the art had fully identified homologous Region II in all prenyl
diphosphate synthases sequenced (13 total synthase sequences). Chen further demonstrates that
Region II of each of the thirteen sequences shares at least 25% homology with Region II of SEQ
ID NO:2. See, e.g., page 602, Figure 2, Region II (compare GGPP_CAN to all other sequences).
Furthermore, in the application as filed, all sequences denoted as Region II of prenyl diphosphate
synthases share at least 25% homology with positions 72 to 93 (Region II) of SEQ ID NO:2.
See, e.g., sequences of Figure 1; mutant nucleic acid sequences of SacGGPS in Example 4 (col.
12, lines 16-43); and wild-type sequence of SacGGPS (GenBank accession no. D28748) in
Example 1 (col. 10, lines 20-26).
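The pairwise comparison against a 25% threshold can be sketched as a percent-identity calculation over an existing alignment of two Region II segments. This is an assumption-laden illustration: the application and Chen may measure "homology" differently (for example, with similarity scores that credit conservative substitutions), the alignment is taken as given rather than computed, and the function name and toy fragments are hypothetical, not sequences from SEQ ID NO:2.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Assumptions: inputs are already aligned to equal length with '-' for gaps;
    columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)


# Toy aligned fragments (hypothetical, for illustration only):
score = percent_identity("DDIMD", "DDLMD")
print(score)          # 80.0
print(score >= 25.0)  # True: this toy pair would meet a 25% threshold
```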

Further support for the amendment may be found by reference to what was well known in 
the art at the time the application was filed. Applicants have attached hereto K. Kelly, 
"Exhaustive and Iterative Clustering of the Protein Databank" JCCG Winter 1998. Kelly 
references the state of the art in early 1998 when acknowledging: 

To date, the most useful principle which is applied to guide such searches is the 
rule that 'similar sequence' implies 'similar structure' .... Generally speaking, if 
a protein sequence shares more than 25% pairwise similarity with a known
structure, it usually also shares at least the broad outlines of the fold topology of 
the known structure. 

See id. at 1 (emphasis added). In view of the teachings of Kelly and Chen, one of skill in the art
would expect a sequence from a prenyl diphosphate synthase comprising D1D2X1(X2X3)X4D3
and having at least 25% homology with Region II of SEQ ID NO:2 to possess the topological
and functional characteristics of Region II as taught by Chen. See Chen at 605, Fig. 6 (disclosing
Region II of prenyl diphosphate synthases as the juncture of α-helix α2 and loop L3).

In view of the homology of the sequences disclosed in the application, including the 
sequences disclosed in Chen, and the understanding in the art of topological and functional 
characteristics of Region II, as taught in Chen and Kelly, Applicants respectfully submit the 
characterization of Region II of prenyl diphosphate synthases in amended claim 1 and new claim 
35, as urged by the Examiner, is fully supported by the application as filed. As such, Applicants
respectfully request the Office withdraw the rejection of claim 1 for clarity of Region II of prenyl 
diphosphate synthase and likewise enter new claim 35 containing the same clarification. 

Office request and support for amendment to claims 2, 16 and 18 to clarify "enzymatic 
activity" 

[10] The Office Action requests amendment to claims 2, 16 and 18 (new claim 36) to
clarify the term "enzymatic activity." Claim 2 and new claim 36 have been amended/modified to 
replace the term "enzymatic activity" with "synthesizes about as much or more prenyl 
diphosphate than the amount of prenyl diphosphate synthesized by the wild type prenyl 
diphosphate synthase under similar conditions." Support for this amendment may be found, for 
example, in Figure 2; the illustration of Figure 2 at column 5, lines 8 to 15; and in Example 5 at 
column 12, line 45 through column 13, line 14. This disclosure shows that the enzymatic 
activity of the synthases was determined by counting total incorporation of radioactive 
isopentenyl diphosphate into prenyl diphosphates generally. As such, enzymatic activity reflects 
a measure of the total synthesis of the broad class of prenyl diphosphates by prenyl diphosphate 
synthases. Claim 2 and new claim 36, as amended, reflect this clarification of the term 
"enzymatic activity." As such, Applicants respectfully request the Office withdraw the rejection 
of claim 2 for clarity of "enzymatic activity" and likewise enter new claim 36. Likewise, the
rejection of claim 16, which depends from claim 2, should be obviated.

Rejection of claims 17-32 under 35 U.S.C. § 112, first paragraph

[11] The Office Action rejects claims 17 to 32 (new claims 35-50) for lack of written
description because the Office believes the genus of sequences defined by D1D2X1(X2X3)X4D3 is
different from the genus of species defined by D1D2X1X2(X3X4)D3. However, as Applicants
have explained above, D1D2X1(X2X3)X4D3 and D1D2X1X2(X3X4)D3 encompass the very same
genus. Because the genus of D1D2X1(X2X3)X4D3 is the same as the genus of
D1D2X1X2(X3X4)D3, Applicants respectfully submit that the presence of D1D2X1(X2X3)X4D3 in
claims 35 to 50 is not new matter and respectfully request withdrawal of the rejection of claims
35 to 50 for lack of support.

Rejection of claims 7, 16 and 23 under 35 U.S.C. § 112, first paragraph 

[12] The Office Action rejects claims 7 and 23 (new claim 41) for lack of written 
description and invites Applicants to show support for the limitation of thermostability greater
than wild-type. Support for claim 7 and new claim 41 may be found, for example, in Figure 2 
and the illustration of Figure 2 at column 5, lines 7 to 14, where Applicants disclose mutant 
enzymes having greater relative enzymatic activity than wild-type enzyme as the temperature of 
the environment is increased. The maintenance of enzymatic activity of the mutants relative to 
two wild-type enzymes as the environment temperature is increased demonstrates the thermal 
stability of the mutant enzymes. Because the application as filed supports claim 7 and new claim 
41, Applicants respectfully request the Office withdraw the rejection of these claims. 

Rejection of claims 1-7, 10, 15-23, 26 and 31-32 under 35 U.S.C. § 112, first paragraph 

[13] The Office Action rejects claims 1-7, 10, 15-23, 26 and 31-32 under 35 U.S.C. 
§ 112, first paragraph, for failure to provide sufficient structure to demonstrate possession of the
claimed subject matter. The Office rejects Applicants' arguments in support of written 
description of the claims because the Office believes that outside of the claims' minimal recited 
structural features of region II, the remainder of the structures of the members encompassed by 
the genus of the claims are completely undefined. The Office concludes that the minimal 
sequence structure recited in the claims would not impart farnesyl diphosphate synthase activity.
According to the Office Action, the recited enzymatic activity may only be achieved by adding 
amino acids back to the terminal ends of the minimal recited structural feature to reconstruct a 
properly folded enzymatically active synthase enzyme. Applicants respectfully traverse the 
Office's conclusions and assertions in turn. 

The Office asserts the pending claims are completely undefined outside of the minimal 
structure recited in the claims. Applicants strenuously disagree with and respectfully traverse 
this assertion. The claims under examination are highly structured and the scope of the claims is 
strictly delimited, well beyond the minimal sequence limitations provided in the claims. 
First, the claims are directed to an enzyme, not the polypeptide sequence of an enzyme.
As such, the enzyme must function as a prenyl diphosphate synthase. Applicants have provided
at least 16 structural examples of functional prenyl diphosphate synthases. Furthermore,
Applicants cite art from 1994 that provides at least 13 structural examples of functional prenyl
diphosphate synthases from a wide array of taxonomic phyla and cite at least five other
articles between 1993 and 1995 disclosing additional examples. Applicants respectfully submit
that both the structure and function of prenyl diphosphate synthase were well defined in the art
and fully disclosed in the application. One of skill in the art would understand that Applicants
possessed all of these enzymes and possessed the means to manipulate these enzymes. As such,
possession by Applicants of the structural and functional prenyl diphosphate synthase limitation
of the claims should not be in doubt. As a result, Applicants traverse the assertion that amino acids
must be added back to the structural limitations of the claims to achieve a functioning synthase
enzyme. In fact, the claims begin, in their first limitation, with a fully structured, functioning
synthase enzyme which is only then subject to modification. Nothing is added back. Applicants
respectfully request the Office withdraw this line of reasoning in support of the rejection of the 
claims. 

Next, the claims are directed to a mutant enzyme defined by the substitution of certain 
amino acids at particular positions in region II of the wild-type enzyme. Applicants have 
provided five examples of mutant enzymes with substitutions at the prescribed positions in the 
sequence of the wild-type enzyme. 

Furthermore, Applicants have provided actual working examples of four of five possible
substitutions in accordance with element 1(a) of claims 1 and 35, namely, substitution at the fifth
position upstream of D1 (e.g., mutant enzyme 3; column 6, line 53), substitution at the fourth
position upstream of D1 (e.g., mutant enzyme 1; column 6, line 48), substitution at the second
position upstream of D1 (e.g., mutant enzyme 5; column 6, line 59) and substitution at the first
position upstream of D1 (e.g., mutant enzyme 4; column 6, line 56).

Applicants have likewise provided an actual working example of the only substitution 
possible in accordance with element 1(b) of claim 1, namely, substitution at the first position 
upstream of D3, and the only substitution possible in accordance with element 1(b) of claim 35, 
namely, substitution at the first position downstream of D^, (e.g., mutant enzyme 5, column 6, 
line 59). Applicants have also provided an actual working example of both insertions possible in
accordance with element 2 of claims 1 and 35, namely, insertion of at least one additional amino
acid between D3 and the first amino acid upstream of D3 (claim 1) and insertion of an amino acid
between the first amino acid downstream of and the first amino acid upstream of (claim
35) (e.g., mutant enzyme 5, column 6, line 59).

In view of the working examples disclosed in the application, Applicants have reduced to
practice all iterative positions encompassed within the structure provided in the genus of claims 1
and 35 except one, namely, substitution at position 2 upstream of D1. As discussed more fully
below, one of skill in the art (upon review of the prenyl diphosphate enzymes disclosed in the
application and known in the art) would recognize that Applicants also possessed substitution of
residues at position 2 upstream of D1.

The Office asserts that one of skill in the art would recognize that the "minimal" 
sequence information in the claims would not impart the recited enzymatic activity. Of course, 
Applicants do not dispute this. Applicants respectfully submit, however, that the "minimal" 
sequence information in the claims is not directed to enzymatic activity of the prenyl diphosphate 
synthase. Instead, the sequence information is directed to modification of the enzymatic activity 
of a wild-type prenyl diphosphate synthase to impart synthesis of a relatively greater amount of 
short-chain prenyl diphosphate. As such, one of skill in the art would recognize that the wild-type
enzyme (fully structured and functional in nature) is enzymatically active and Applicants
have provided the artisan with a mechanism for reducing the chain length of the product that 
results from said already established enzymatic activity. 

Applicants respectfully dispute the Office's conclusion that the claimed elements do not 
impart enzymatic activity. In fact, in five of five experiments, Applicants have shown that the 
"minimal" amino acid structure of the claims, in combination with all of the other elements of 
the claims (e.g., starting material of an enzymatically active enzyme), provides enzymatic
activity as claimed. This conclusion is further supported by the results of Chen, wherein thirteen
wild-type (enzymatically fully functional) enzymes have variable amino acids at each and all of
the positions specified in the claims. See Chen at 602, Figure 2 (e.g., FFP_YSC is substituted at
positions 1, 2, 4 and 5 upstream from D1 and position 1 downstream from D1 of GGPP_CAN).
As such, one of skill would specifically recognize the enzymatic functionality of wild-type
enzymes subject to the strictly circumscribed substitutions set forth in claim 1 and claim 35.

Finally, the enzymatic functionality of the mutant prenyl diphosphate synthase is a 
limitation of the claims. As such, the scope of the claims is directed only to functional mutant 
enzymes. Since Applicants have provided working examples of functional mutant enzymes with 
substitutions and insertions at all but one possible positional iteration, as set forth in the claims, 
Applicants respectfully submit it is immediately apparent to one of skill in the art that Applicants 
possessed the invention of the claims. 

In view of the highly limited structure and well established function of the claims, 
Applicants respectfully request the Office withdraw the rejection of the claims for having 
"completely undefined structure" outside of the aspartic-acid rich region and for providing the 
artisan with insufficient guidance to generate a functional enzyme. The disclosure of the 
application renders these conclusions manifestly unsubstantiated. 

Rejection of claims 1-32 under 35 U.S.C. § 112, first paragraph, for enablement 

[14] The Office Action rejects claims 1-32 under 35 U.S.C. § 112, first paragraph, for
failure to provide an enabling disclosure. Applicants respectfully traverse the rejection. As set
forth immediately above, Applicants have provided working examples of nearly all positional
iterations possible within the claims, as well as disclosure demonstrating the extensive
number of prenyl diphosphate synthases known in the art and the broad functional variability of the
conserved sequences, which Applicants have taught as candidates for modification. Such
disclosure and skill in the art supports a conclusion that the skilled artisan would have 
understood the examples and teachings of the application can be practiced on any functional 
wild-type prenyl diphosphate synthase to produce a mutant enzyme of the claims. 

The Office has asked Applicants to provide a discussion of enablement within the Wands 
factors. The Office insists that the claims are overly broad and encompass a vast number of 
mutant prenyl diphosphate synthases having the minimal structural features of the claims but 
also encompassing synthases from transgenic organisms and other limitless origins. Applicants 
respectfully and strenuously traverse the allegation of overbreadth. The claims are not
overbroad. In fact, they are highly circumscribed. The claims do not encompass a vast number
of synthases and do not encompass all synthases from transgenic organisms. To the contrary, the
claims encompass mutants of a wild-type enzyme where substitutions and insertions have
occurred at a total of 7 possible positions. Considering that the claimed enzyme must be derived 
from a wild-type, must be a mutant, must comprise mutations in a highly circumscribed 7 
possible positions, and as claimed must produce shorter prenyl diphosphates than the wild-type, 
the claims are simply extremely narrow. 

The Office insists there is a lack of guidance and working examples in the application 
because only five working examples have been provided. Applicants respectfully traverse this 
conclusion. As discussed more fully above, the working examples in the specification happen to 
represent all but one possible positional iteration within the structural limitations of the claims. 
Furthermore, the art was replete with examples of functional wild-type enzymes, which the
skilled artisan would have compared with the data of the application to understand that the
claimed enzymes may be derived from any of the functional wild-type enzymes known in the art.
This conclusion would have been easily drawn from the extensive conservation of region II and
domain I within the synthases of interest. See Chen at 602. As such, the guidance within the
application is particularly extensive with respect to the universe of possible iterations in the
claims. Additionally, the art provided a full understanding of wild-type prenyl diphosphate
synthases at the time of filing. In view of the foregoing, Applicants respectfully request that the
Office acknowledge that the application as filed contained more than sufficient guidance and
ample working examples when each limitation of the claims is fully considered in the
enablement analysis.

The Office insists there is a high level of unpredictability in the art and cites Branden et
al. ("Introduction to Protein Structure," Garland Publishing Inc., New York, 1991) for the
proposition that protein engineers have frequently been surprised by the range of effects caused 
by point mutations that they hoped would change only one specific and simple property in 
enzymes. The Office further notes Branden's assertion that the artisan in 1991 knew little about 
the rules of protein stability and the artisan found it difficult to design de novo stable proteins 
with specific functions. Applicants respectfully but emphatically traverse these findings. 

First, Kelly (1998) (attached herewith) directly contradicts the teaching of Branden 
(1991) and demonstrates amazing strides in the artisan's understanding of protein structure and 
function between 1991 and 1997 when the application was first filed.
Next, while a range of effects may be expected from a point mutation in a non-conserved
sequence, a point mutation in a highly conserved sequence resulting in a sequence found in other
functional enzymes would likely not give rise to unexpected effects. Since the pending claims
are directed to mutations in highly circumscribed portions of a highly conserved sequence for
which extensive data is available on functional wild-type enzymes, these precisely directed and
expressly limited mutations would not be expected to provide unpredictable results. In fact, the
expectation would be the opposite. Applicants respectfully submit that the art exhibits a high
level of predictability with respect to the practice of the claims.

Finally, experimentation would not be undue. The application provides working 
examples of six of seven possible positions in which mutations may be made. The artisan need 
only follow the protocol provided in the application to practice the invention on wild-type prenyl 
diphosphate synthases with mutations created at the seven possible positions. 

The Office insists that it is not routine in the art to screen for all polypeptides having a 
substantial number of substitutions or modifications as encompassed in the claims. Applicants 
do not disagree with this assertion. Applicants respectfully submit, however, that screening for a 
substantial number of substitutions or modifications is not at all necessary to practice the 
invention. In fact, the only screening necessary is the very screening exemplified in the 
application, for which five of five screens returned working results. As such, Applicants
respectfully submit that the easy screening step in the working examples demonstrates in itself 
that experimentation would not be undue. 

Double Patenting Rejections 

[15] As noted by the Office Action, Applicants will attend to the double patenting 
rejections at such time as claims in the present application are allowed. Applicants gratefully 
acknowledge the Office's correction of Applicants' understanding that the double patenting
rejection is provisional.
CONCLUSION



The claims are believed to be in condition for allowance and Applicants respectfully 
Abstract

Isoprenyl diphosphate synthases are ubiquitous enzymes that catalyze the basic chain-elongation reaction in the
isoprene biosynthetic pathway. Pairwise sequence comparisons were made for 6 farnesyl diphosphate synthases,
6 geranylgeranyl diphosphate synthases, and a hexaprenyl diphosphate synthase. Five regions with highly conserved
residues, two of which contain aspartate-rich DDXX(XX)D motifs found in many prenyltransferases, were
identified. A consensus secondary structure for the group, consisting mostly of α-helices, was predicted for the
multiply aligned sequences from amino acid compositions, computer assignments of local structure, and hydropathy
indices. Progressive sequence alignments suggest that the 13 isoprenyl diphosphate synthases evolved from
a common ancestor into 3 distinct clusters. The most distant separation is between yeast hexaprenyl diphosphate
synthetase and the other enzymes. Except for the chromoplastic geranylgeranyl diphosphate synthase from Capsicum
annuum, the remaining farnesyl and geranylgeranyl diphosphate synthases segregate into prokaryotic/archaebacterial
and eukaryotic families.

Keywords: catalytic site; evolution; farnesyl diphosphate; geranylgeranyl diphosphate; prenyltransferase; secondary 
structure; substrate binding 



With more than 23,000 known members, isoprenoids constitute the most chemically diverse family of naturally occurring compounds. Some of the more important products of the pathway are the sterols (Poulter & Rilling, 1981a), ubiquinones (Ashby & Edwards, 1990), dolichols (Matsuoka et al., 1991), carotenoids (Spurgeon & Porter, 1981), prenylated proteins (Clarke, 1992), and plant mono-, sesqui-, and diterpenes (Cane, 1981; Croteau, 1981; West, 1981). All of these compounds are derived from linear isoprenoid diphosphates synthesized from isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) by a family of prenyltransferases that catalyze sequential condensations of IPP with allylic isoprenoid diphosphates, as shown in Figure 1. Although the chemical mechanisms of these condensation reactions are identical, the isoprenyl diphosphate synthases differ in their selectivity with respect to the chain length and double-bond stereochemistry of their respective allylic substrates and the chain length and stereochemistry of newly formed double bonds in their products (Poulter & Rilling, 1978, 1981b).

Reprint requests to: C. Dale Poulter, Department of Chemistry, University of Utah, Salt Lake City, Utah 84112; e-mail: poulter@chemistry.utah.edu.

Abbreviations: DMAPP, dimethylallyl diphosphate; FPP, farnesyl diphosphate; FPPSase, farnesyl diphosphate synthase; GGPP, geranylgeranyl diphosphate; GGPPSase, geranylgeranyl diphosphate synthase; HexPPSase, hexaprenyl diphosphate synthase; IPP, isopentenyl diphosphate.
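The chain-length bookkeeping behind Figure 1 can be sketched in a few lines, assuming a head-to-tail series that starts from DMAPP (C5), where each condensation with IPP adds one C5 unit. The product names are the standard trivial names of the linear diphosphates; the helper `product_after` and its lookup table are illustrative and not part of the original work.

```python
# Hypothetical helper: the linear product after n successive IPP condensations onto DMAPP.
# Each condensation with IPP (a C5 unit) lengthens the allylic diphosphate by 5 carbons.
PRODUCTS = {
    5: "dimethylallyl diphosphate (DMAPP)",
    10: "geranyl diphosphate (GPP)",
    15: "farnesyl diphosphate (FPP)",
    20: "geranylgeranyl diphosphate (GGPP)",
    25: "geranylfarnesyl diphosphate",
    30: "hexaprenyl diphosphate (HexPP)",
}


def product_after(n_condensations: int) -> tuple[int, str]:
    """Return (carbon count, name) after n IPP condensations starting from DMAPP."""
    if n_condensations < 0:
        raise ValueError("number of condensations must be non-negative")
    carbons = 5 * (1 + n_condensations)
    return carbons, PRODUCTS.get(carbons, f"C{carbons} isoprenyl diphosphate")


if __name__ == "__main__":
    for n in range(6):
        print(n, *product_after(n))
```

In this scheme an FPPSase stops after 2 condensations, a GGPPSase after 3, and the yeast HexPPSase after 5.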

During the past few years the structural genes for several farnesyl diphosphate synthases, geranylgeranyl diphosphate synthases, and a hexaprenyl diphosphate synthase have been identified and characterized. Early sequence comparisons revealed 2 conserved DDXX(XX)D aspartate-rich domains (Ashby et al., 1990), which were thought to be binding sites for the diphosphate moieties in IPP and the allylic substrates. This proposal was supported by kinetic studies of site-directed mutants (Marrero et al., 1992). More recently, Koyama et al. (1993) identified 7 conserved regions in eubacterial and eukaryotic FPPSases, including the 2 aspartate-rich regions.
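The DDXX(XX)D notation denotes two aspartates, then either two or four residues of any kind, then a third aspartate. The hypothetical helper below scans a one-letter sequence for both forms, using a regular-expression lookahead so that overlapping occurrences are all reported; the test strings are made up for illustration (the first is only loosely modeled on region II) and are not entries from Figure 2.

```python
import re


def find_ddxxd(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every DDXXD and DDXXXXD occurrence.

    X is any residue. The lookahead (?=...) consumes no characters, so
    overlapping motifs are reported separately.
    """
    hits = []
    for spacer in (2, 4):  # DDXXD and DDXXXXD
        pattern = re.compile(r"(?=(DD.{%d}D))" % spacer)
        hits.extend((m.start(), m.group(1)) for m in pattern.finditer(seq.upper()))
    return sorted(hits)


if __name__ == "__main__":
    print(find_ddxxd("LVADDIMDSSLRRG"))
```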

Multiple sequence alignments are valuable for identifying conserved sequences in proteins. In addition, multiple alignments can be used in conjunction with procedures for predicting secondary structure from primary sequences to obtain improved predictions, as for example in the prediction of the structure of the α subunit of tryptophan synthase (Crawford et al., 1987). We now report sequence comparisons for 13 prenyltransferases, including 6 FPPSases, 6 GGPPSases, and a HexPPSase, that suggest divergence from a common ancestor based on a combination of function and organisms. We also propose a common α-helical secondary structure for the 13 enzymes.

Secondary structure of isoprenoid diphosphates

Fig. 1. Synthesis of linear isoprenoid diphosphates from IPP and DMAPP by isoprenyl diphosphate synthases.

Results 

Pairwise comparisons 

Amino acid sequences for 6 FPPSases, 6 GGPPSases, and a HexPPSase were compared pairwise by the Needleman and Wunsch method using the TREE program of Feng and Doolittle (1987, 1990), and the results are shown in Table 1. There was 43-84% amino acid identity among the eukaryotic FPPSases from humans (Sheares et al., 1989), rats (Clarke et al., 1987), chickens (Kroon, unpubl. results), and yeast (Anderson et al., 1989). Substantially lower identities of 10-25% were seen when the eukaryotic FPPSases were compared as a group with the other 9 prenyltransferases, including the eubacterial FPPSases from Escherichia coli (Fujisaki et al., 1990) and Bacillus stearothermophilus (Koyama et al., 1993). However, the 2 eubacterial FPPSases showed substantial identities of 27-44% with the chromoplast (chloroplast-related) GGPPSase from Capsicum annuum (Kuntz et al., 1992), the bifunctional archaebacterial FPP/GGPPSase from Methanobacterium thermoautotrophicum (Chen & Poulter, unpubl. results), and the eubacterial GGPPSases from Erwinia herbicola (Armstrong et al., 1990; Math et al., 1992), Erwinia uredovora (Misawa et al., 1990), and Rhodobacter capsulatus (Armstrong et al., 1989). The fungal GGPPSase from Neurospora crassa (Carattoli et al., 1991) and yeast HexPPSase (Ashby & Edwards, 1990) had lower sequence identities of 10-15% with the other prenyltransferases.
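As a minimal sketch of the pairwise step, the code below implements a textbook Needleman-Wunsch global alignment and then computes percent identity over the aligned columns where neither sequence has a gap, one plausible reading of "based on the aligned regions" in Table 1. It is not the TREE program: the scoring (match +1, mismatch -1, linear gap -2) is an illustrative assumption and not necessarily what TREE uses, so identities from this sketch should not be expected to reproduce Table 1.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                            # gap in b
                score[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical residues as a percentage of columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ACGT", "AGT")
    print(s, x, y, percent_identity(x, y))
```

Real protein comparisons would normally replace the flat match/mismatch scores with an amino acid substitution matrix.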

Multiple sequence alignments, conserved sequences, and a phylogenetic tree for isoprenyl diphosphate synthases

The amino acid sequences for the 13 isoprenyl diphosphate synthases were aligned according to the procedures of Feng and Doolittle (1990), as shown in Figure 2. Five regions, designated I-V, were found where highly conserved residues appeared in at least 12 of the 13 sequences. Similar regions were identified by Koyama et al. (1993) for a more limited set of isoprenyl diphosphate synthases. Regions II and V are rich in negatively charged aspartates and positively charged arginines or lysines. These sequences correspond to those originally labeled as domains I and II, respectively, by Ashby et al. (1990), who proposed that they were diphosphate binding motifs.
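Locating regions such as I-V amounts to scanning alignment columns for a residue shared by at least 12 of the 13 sequences. The sketch below does this for any gapped alignment supplied as equal-length strings; the function name and the toy alignment are hypothetical, and ignoring gap characters when counting is an assumption.

```python
from collections import Counter


def conserved_columns(alignment: list[str], min_count: int) -> list[tuple[int, str, int]]:
    """Columns whose most frequent residue occurs in at least min_count sequences.

    Returns (0-based column, residue, count); gap characters '-' are ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            continue
        residue, count = Counter(residues).most_common(1)[0]
        if count >= min_count:
            hits.append((col, residue, count))
    return hits


if __name__ == "__main__":
    toy = ["DDLP", "DDIP", "DE-P"]  # hypothetical 3-sequence alignment
    print(conserved_columns(toy, min_count=3))
```

For the 13-sequence alignment, the call would use `min_count=12`, and runs of adjacent hits would then be merged into regions.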

The relatively high pairwise percentage identities among selected pairs of prenyltransferases listed in Table 1 are consistent with all 13 isoprenyl diphosphate synthases having diverged from a common ancestral enzyme. This hypothesis is further supported by the existence of 5 highly conserved regions, 2 of which are of considerable length. However, inspection of the alignments showed a high degree of divergence between the eukaryotic and eubacterial FPPSases and between the FPPSase and HexPPSase in yeast.

A phylogenetic tree (see Fig. 3) was constructed for the 13 isoprenyl diphosphate synthases using the progressive multiple alignments (Feng & Doolittle, 1987, 1990) shown in Figure 2. Three major groupings were obtained. The most primitive branch was a functional segregation of the longer-chain yeast HexPPSase from the shorter-chain farnesyl and geranylgeranyl diphosphate synthases. The shorter-chain enzymes further segregated into bacterial (eubacteria and archaebacteria) and eukaryotic clusters.



Table 1. Pairwise percent identity of isoprenyl diphosphate synthases(a,b)

          FPP_  FPP_  FPP_  FPP_  FPP_  GGPP_ GGPP_ GGPP_ GGPP_ GGPP_ GGPP_ HPP_
Protein   RAT   CHI   YSC   ECO   BST   CAN   MTH   EHE   EUR   RCA   NCR   YSC
FPP_HUM   84.1  68.0  43.0  23.1  22.3  17.6  21.2  18.9  18.4  18.4  12.6  16.1
FPP_RAT         66.0  46.0  24.8  22.7  17.6  23.0  19.5  18.4  19.2  13.1  17.0
FPP_CHI               46.5  22.0  24.9  18.7  23.5  18.4  18.6  19.8  12.9  16.3
FPP_YSC                     21.2  22.6  20.7  23.9  19.4  15.7  20.2  10.3  17.0
FPP_ECO                           42.9  34.7  30.8  31.7  31.9  29.4  15.8  23.6
FPP_BST                                 39.6  33.4  33.6  33.6  31.3  12.9  25.6
GGPP_CAN                                      28.7  27.1  27.1  25.8  12.5  20.4
GGPP_MTH                                            25.7  26.6  29.9  16.1  22.7
GGPP_EHE                                                  51.8  26.4  11.8  21.8
GGPP_EUR                                                        25.6  11.3  22.5
GGPP_RCA                                                              13.9  20.6
GGPP_NCR                                                                    10.7

(a) Pairwise sequence comparisons were done by TREE of Feng and Doolittle (1987). The percent identity was based on the aligned regions.

(b) FPP_HUM, Homo sapiens FPPSase; FPP_RAT, Rattus rattus FPPSase; FPP_CHI, Gallus gallus FPPSase; FPP_YSC, yeast Saccharomyces cerevisiae FPPSase; FPP_ECO, Escherichia coli FPPSase; FPP_BST, Bacillus stearothermophilus FPPSase; GGPP_CAN, Capsicum annuum GGPPSase; GGPP_MTH, Methanobacterium thermoautotrophicum GGPPSase; GGPP_EHE, Erwinia herbicola GGPPSase; GGPP_EUR, Erwinia uredovora GGPPSase; GGPP_RCA, Rhodobacter capsulatus GGPPSase; GGPP_NCR, Neurospora crassa GGPPSase; HPP_YSC, yeast S. cerevisiae HexPPSase.
A. Chen et al.



[Figure 2: multiple sequence alignment of the 13 isoprenyl diphosphate synthases; alignment text not reproduced]

Fig. 2. Multiple sequence alignment for the 13 isoprenyl diphosphate synthases listed in Table 1. Long N-terminal sequences and insertions in HPP_YSC are omitted, but the numbers of amino acids are shown in parentheses. Consensus sequences shown below the 5 highly conserved sequence domains, I-V, are double underlined. A region clearly corresponding to domain III was not seen in HPP_YSC. Residues conserved differently in eukaryotic FPP synthases are in bold. The peptide in chicken FPP synthase that was labeled during photoaffinity experiments is underlined.



inclusion of the chromoplastic GGPPSase from C. annuum (green peppers) in a cluster of eubacterial farnesyl and geranylgeranyl diphosphate synthases. These results indicate that the chain length selectivity of the short-chain prenyltransferases cannot be readily deduced from sequence comparisons and that assignments of function should be made biochemically.
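Figure 3 was built with the progressive-alignment procedure of the TREE program, which is not reproduced here. As a simpler, swapped-in illustration of distance-based tree building, the sketch below applies UPGMA (average-linkage clustering) to a five-protein subset of Table 1, taking distance as 100 minus percent identity; both the method and the distance conversion are assumptions made for illustration. On this subset it separates the eukaryotic FPPSases from the eubacterial GGPPSases, but it is not expected to reproduce Figure 3 in general.

```python
from itertools import combinations

# Percent identities for five proteins, transcribed from Table 1.
IDENTITY = {
    ("FPP_HUM", "FPP_RAT"): 84.1, ("FPP_HUM", "FPP_CHI"): 68.0, ("FPP_RAT", "FPP_CHI"): 66.0,
    ("FPP_HUM", "GGPP_EHE"): 18.9, ("FPP_HUM", "GGPP_EUR"): 18.4,
    ("FPP_RAT", "GGPP_EHE"): 19.5, ("FPP_RAT", "GGPP_EUR"): 18.4,
    ("FPP_CHI", "GGPP_EHE"): 18.4, ("FPP_CHI", "GGPP_EUR"): 18.6,
    ("GGPP_EHE", "GGPP_EUR"): 51.8,
}
LABELS = ["FPP_HUM", "FPP_RAT", "FPP_CHI", "GGPP_EHE", "GGPP_EUR"]


def distance(x: str, y: str) -> float:
    """Distance = 100 - percent identity (an assumed, simple conversion)."""
    return 100.0 - IDENTITY.get((x, y), IDENTITY.get((y, x)))


def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    clusters = {i: (name, 1) for i, name in enumerate(labels)}  # id -> (subtree, size)
    d = {(i, j): dist(labels[i], labels[j]) for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), _ = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        merged = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            merged[k] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)  # size-weighted mean
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in merged.items():
            d[(k, next_id)] = v  # existing ids are always smaller than next_id
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


if __name__ == "__main__":
    print(upgma(LABELS, distance))
```

UPGMA assumes a constant rate of change along all lineages, which is one reason it can disagree with trees built by other methods.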

Prediction of secondary structure 

Because the pairwise alignments indicate that the isoprenyl diphosphate synthases diverged from a common ancestor, it is reasonable to assume that the gross topological features of the ancestor were conserved during evolution. We first compared the amino acid compositions of 11 prenyltransferases, all of the FPPSases and GGPPSases except the highly diverged N. crassa enzyme, using Chou's approach (Chou, 1989) for predicting the structural classes of proteins from their amino acid contents. The results are shown in Table 2. As judged by comparing the average amino acid composition of the isoprenyl diphosphate synthases with those of representative all-α, all-β, α + β, and α/β proteins, the prenyltransferases most closely resemble typical all-α or α/β structures, suggesting an all-α-helix protein or a protein dominated by α-helices.
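The class assignment in Table 2 rests on the difference index defined in its footnote: the sum, over the 20 amino acids, of the absolute differences between two composition vectors. A minimal sketch of that calculation, using the rounded percentages as transcribed in Table 2 (the helper names are hypothetical):

```python
# Amino acid order used for every composition vector below.
AMINO_ACIDS = ["Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
               "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"]

# Average compositions (%) transcribed from Table 2.
SYNTHASES = [11.2, 5.1, 2.5, 7.0, 1.6, 4.6, 7.3, 6.7, 2.3, 5.5,
             11.7, 6.1, 2.6, 3.3, 3.5, 3.1, 3.9, 0.6, 3.3, 6.0]
CLASSES = {
    "all-alpha": [11.6, 2.2, 4.0, 6.7, 0.9, 2.7, 5.5, 8.1, 4.5, 3.7,
                  9.0, 10.2, 2.0, 5.0, 3.4, 5.0, 4.9, 1.3, 2.6, 6.8],
    "all-beta": [7.3, 2.4, 5.0, 4.4, 2.7, 4.4, 3.1, 10.7, 1.8, 4.3,
                 6.4, 4.1, 0.6, 3.1, 4.6, 12.3, 9.1, 1.6, 4.0, 8.2],
    "alpha+beta": [9.3, 4.1, 6.4, 5.9, 3.9, 3.9, 4.6, 9.1, 1.7, 4.9,
                   5.8, 5.9, 1.3, 2.8, 3.8, 6.7, 6.2, 1.6, 5.7, 6.5],
    "alpha/beta": [8.3, 3.4, 4.2, 5.6, 1.5, 2.6, 5.9, 8.7, 2.5, 5.5,
                   7.8, 7.4, 2.1, 3.6, 4.3, 7.5, 5.5, 1.7, 3.0, 8.7],
}


def difference_index(a: list[float], b: list[float]) -> float:
    """Sum of absolute composition differences, residue by residue."""
    if len(a) != len(b):
        raise ValueError("composition vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_classes(composition: list[float]) -> list[tuple[str, float]]:
    """Structural classes ordered from most to least similar composition."""
    scored = [(name, difference_index(composition, ref)) for name, ref in CLASSES.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for name, index in rank_classes(SYNTHASES):
        print(f"{name:10s} {index:5.1f}")
```

With these rounded values the sums come out about 2 units above the published indices (for example 29.2 rather than 27.4 for all-α), possibly reflecting rounding or errors in the transcribed values. The ranking is the same: all-α is closest, with α/β next.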

A consensus secondary structure for the isoprenyl diphosphate synthases was predicted from a combination of the multiple sequence alignments, probabilities for formation of loop, α-helix, and β-sheet regions (Fig. 4), and an average hydropathy plot (Fig. 5). Predictions by the Garnier-Osguthorpe-Robson (GOR) and Chou-Fasman (CF) methods were in good agreement and predicted 8 α-helices and 4 short β-sheets. The locations of the α-helices and β-sheets generally correlated well with the average hydropathy plot for the 11 amino acid sequences. Loops 1, 2,



Downloaded from www.proteinscience.org on March 30, 2006 






[Figure 3: phylogenetic tree; leaves from top to bottom: FPP_HUM, FPP_RAT, FPP_CHI, FPP_YSC, GGPP_NCR; GGPP_EHE, GGPP_EUR, FPP_ECO, FPP_BST, GGPP_CAN, GGPP_RCA; GGPP_MTH; HPP_YSC]

Fig. 3. A phylogenetic tree for isoprenyl diphosphate synthases constructed from progressive alignments using the TREE program and refined as described by Feng and Doolittle (1990).



3, 4, and 7 were consistently assigned by computer-predicted turns, gaps in the alignments, and hydrophilic peaks in the hydropathy plot. A short β-sheet was predicted within loop 3 and within loop 7 by the GOR and CF algorithms. However, the hydropathy plots placed these "sheet" sequences in hydrophilic regions and cast doubt on their existence. The assignments for loops 5 and 6 were based on large gaps in these regions and large negative hydropathy indices. The assignment for loop 8 was based on large negative hydropathy indices in that region. There were also gaps in the alignment between α4 and α5; however, this region was not hydrophilic, and a turn motif was not predicted by GOR or CF. Perhaps these 2 helices are joined by a spacer of variable length or are fused into a single α-helix. In addition, no turns or loops were predicted between β2 and α6 or between α7 and β3. Because the average amino acid composition predicted a structure composed primarily of α-helices, the short β2 and β3 regions may be helical extensions of α6 and α7, respectively, rather than β-sheets.
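The average hydropathy profile used alongside these predictions (Fig. 5) is obtained by averaging per-residue hydropathy values at each aligned position. The sketch below assumes the Kyte-Doolittle scale and a 7-residue smoothing window, neither of which is specified in this section; gap characters and non-standard letters are skipped when averaging, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; this section does not name one).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def average_hydropathy(alignment: list[str], scale: dict = KD) -> list:
    """Mean hydropathy at each alignment column; None where no column entry is in the scale.

    Gaps ('-') and any letter missing from the scale are skipped.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        values = [scale[seq[col]] for seq in alignment if seq[col] in scale]
        profile.append(sum(values) / len(values) if values else None)
    return profile


def smooth(profile: list, window: int = 7) -> list:
    """Centered moving average over the non-None values in each window (window should be odd)."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        values = [v for v in profile[max(0, i - half): i + half + 1] if v is not None]
        out.append(sum(values) / len(values) if values else None)
    return out


if __name__ == "__main__":
    toy = ["IAKDL", "V-KEL", "L-RDI"]  # hypothetical 3-sequence alignment
    print(smooth(average_hydropathy(toy), window=3))
```

Peaks of the smoothed profile point to buried, often helical, segments, and troughs to exposed loops, which is how the profile is used in the loop assignments above.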

A predicted average secondary structure for the isoprenyl diphosphate synthases is presented in Figure 6. The high α-helix content in the structure is consistent with the statistical prediction based on amino acid composition. The secondary structural elements were arranged to place the 5 regions containing highly conserved sequences together on the same face of the structure. Although the 3-dimensional fold is not known for any prenyltransferase, one might imagine an antiparallel orientation of α2, α3, and α4/α5 that allows loops 3, 5, and 7 to be brought together. Additional support for this folding pattern is discussed in the next section. Because the consensus structure was constructed from homologous core sequences, individual enzymes may contain additional elements of secondary structure that lie outside the predicted consensus regions. Likewise, the lengths of some secondary structural elements, loops, and spacers undoubtedly vary from protein to protein.



A model for substrate binding 

The predicted consensus structure, along with other information about catalytic site residues, can serve as a guide for locating putative binding sites for the substrates. The 5 highly conserved regions identified in Figure 2 are located in the secondary structure as follows (see Figs. 4, 6): Region I, from loop 1 to the N-terminal part of β1; Region II, from the C-terminal half of α2 to the N-terminal half of loop 3; Region III, α5; Region IV, the C-terminus of loop 5; and Region V, from the C-terminus of α7 through loop 7. Photoaffinity experiments with an azido analog of IPP (Brems et al., 1981) labeled several amino acids from positions 157 to 188 in L5 of avian FPPSase, suggesting that this part of the enzyme interacts with the hydrophobic isopentenyl moiety in IPP. Recently, Blanchard and Karst (1993) discovered that a mutation at K197 near the C-terminus of L5 in yeast FPPSase both reduced the activity and altered the chain length selectivity of the enzyme. K197 is located just beyond the region labeled in the photoaffinity studies with the avian protein. These results suggest that much of L5 forms an integral part of the IPP pocket, with the C-terminal end of the loop extending to the binding site for the allylic substrate.

Table 2. Comparison of average amino acid compositions (%) of isoprenyl diphosphate synthases and the 4 protein classes(a)

Amino acid          Synthases(b)  All-α   All-β   α + β   α/β
Ala                 11.2          11.6    7.3     9.3     8.3
Arg                 5.1           2.2     2.4     4.1     3.4
Asn                 2.5           4.0     5.0     6.4     4.2
Asp                 7.0           6.7     4.4     5.9     5.6
Cys                 1.6           0.9     2.7     3.9     1.5
Gln                 4.6           2.7     4.4     3.9     2.6
Glu                 7.3           5.5     3.1     4.6     5.9
Gly                 6.7           8.1     10.7    9.1     8.7
His                 2.3           4.5     1.8     1.7     2.5
Ile                 5.5           3.7     4.3     4.9     5.5
Leu                 11.7          9.0     6.4     5.8     7.8
Lys                 6.1           10.2    4.1     5.9     7.4
Met                 2.6           2.0     0.6     1.3     2.1
Phe                 3.3           5.0     3.1     2.8     3.6
Pro                 3.5           3.4     4.6     3.8     4.3
Ser                 3.1           5.0     12.3    6.7     7.5
Thr                 3.9           4.9     9.1     6.2     5.5
Trp                 0.6           1.3     1.6     1.6     1.7
Tyr                 3.3           2.6     4.0     5.7     3.0
Val                 6.0           6.8     8.2     6.5     8.7
Difference index(c)               27.4    49.8    33.2    28.3

(a) Values for protein classes all-α, all-β, α + β, and α/β are taken from Chou (1989).

(b) Average amino acid compositions of all FPP synthases and GGPP synthases (not including GGPP_NCR).

(c) Difference index = Σ|Ca_i - Cb_i|, as described in Methods.

[Figure 4: alignment of the FPP and GGPP synthases with GOR and CF secondary structure predictions; alignment text not reproduced]

Fig. 4. A predicted consensus secondary structure based on multiple sequence alignments. The alignment of FPP synthases and GGPP synthases (not including GGPP_NCR) is the same as in Figure 2 except that the N- and C-termini are omitted. The numbers on top of the sequences show the alignment positions, whereas the numbers on the right are the actual amino acid numbers. The 5 conserved domains are double underlined, and the affinity-labeled peptide in chicken FPP synthase is underlined. |, hydrophobic residues; CF, consensus secondary structure prediction by Chou and Fasman; GOR, consensus secondary structure prediction by GOR. The inserted gaps in the alignment are also marked by >. H and α, α-helix; B and β, β-sheet; L, loop; S, spacer; T, turn.

The highly conserved DDXX(XX)D motifs in L3 and L7, as well as the arginine doublet in L3, are likely candidates for diphosphate binding sites. These predictions (Ashby et al., 1990) are consistent with site-directed mutagenesis experiments (Joly & Edwards, 1993; Song & Poulter, 1994) establishing that all of these residues except the last aspartate in L7 are essential for catalysis. Which substrate binds to which aspartate-rich region is not known. Ashby et al. (1990) suggested that the DDXXD motif in L7 is the allylic binding site on the basis of sequence comparisons with prenyltransferases that utilize non-isoprenoid acceptors instead of IPP; however, except for the 3 aspartates, the overall sequence homologies in this region were low. A helical wheel projection of α2 and α3 indicates that a substantial portion of the total exposed surface area of these helices is hydrophobic. An alternative model for substrate binding has the diphosphate moiety in the allylic substrate interacting with the DDXX(XX)D motif in L3 instead of L7, with α1 and α3 facilitating binding of the hydrophobic isoprenoid tail through hydrophobic interactions. In this scenario, the diphosphate residue in IPP binds to the DDXXD region in L7, with the hydrophobic isopentenyl moiety in the region of the active site bounded by most of L5, which was labeled with the IPP photoaffinity analog.
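A helical wheel places successive residues of an ideal α-helix 100° apart (about 3.6 residues per turn), so a hydrophobic face shows up as a cluster of hydrophobic residues within one half of the wheel. The sketch below is a hypothetical helper, not taken from the original study: it computes wheel angles and finds the 180° arc holding the most hydrophobic residues. The hydrophobic residue set and the test peptide are assumptions; the α2 and α3 sequences themselves would have to be taken from the alignment.

```python
# Residues counted as hydrophobic in this sketch (an assumed, simplified set).
HYDROPHOBIC = set("AVLIMFWC")


def wheel_angles(n: int, step: float = 100.0) -> list[float]:
    """Angle (degrees) of residues 0..n-1 on an ideal alpha-helix wheel."""
    return [(i * step) % 360.0 for i in range(n)]


def best_hydrophobic_face(seq: str, arc: float = 180.0) -> tuple[float, int, int]:
    """Return (start angle, hydrophobic residues in the best arc, total hydrophobic residues).

    An optimal arc can be rotated forward until it starts on its first hydrophobic
    member without losing any members, so only those angles need to be tried as starts.
    """
    angles = wheel_angles(len(seq))
    hydrophobic = [a for a, res in zip(angles, seq.upper()) if res in HYDROPHOBIC]
    best_start, best_count = 0.0, 0
    for start in hydrophobic:
        count = sum(1 for a in hydrophobic if (a - start) % 360.0 < arc)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count, len(hydrophobic)


if __name__ == "__main__":
    print(best_hydrophobic_face("LKKLLKKL"))  # hypothetical amphipathic test peptide
```

When the count in the best arc approaches the total, as for the test peptide, the helix is strongly amphipathic; an evenly spread count suggests hydrophobic residues all around the helix.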

All of the isoprenyl diphosphate synthases except the GGPPSases from Erwinia and Rhodobacter have charged side chains in 2 of the final 3 C-terminal residues. Amino acids containing positively charged side chains appear in the first and third positions in most of the enzymes. Site-directed mutagenesis of R350 in Saccharomyces cerevisiae FPPSase had little effect on the catalytic constants for the enzyme (Song & Poulter, 1994). However, fusion of a negatively charged EEF α-tubulin C-terminal epitope to the wild-type sequence reduced Vm 12-fold and was accompanied by a 14-fold increase in Km for IPP. Laskovics and Poulter (1981) measured the individual kinetic constants for avian FPPSase and found that the rates of addition of substrates were substantially below the diffusion-controlled limits. These results are consistent with a conformational change in FPPSase upon binding of substrates. Thus, the C-terminus of the enzyme may form a flexible flap that helps seal the active site during catalysis.
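The two kinetic changes reported for the epitope fusion can be combined into the specificity constant for IPP. Reading the 14-fold change as an increase in the Michaelis constant Km, and assuming both constants were measured under the same conditions, the fold changes multiply:

```latex
\frac{V_m'}{K_m'} \;=\; \frac{V_m/12}{14\,K_m} \;=\; \frac{1}{168}\,\frac{V_m}{K_m}
```

On that reading, the fused enzyme would be roughly 170-fold less efficient with IPP at low substrate concentrations. This is an inference from the reported fold changes, not a separately measured value.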
Fig. 5. An average hydropathy plot for isoprenyl diphosphate synthases. Hydropathy indices of FPP synthases and GGPP synthases (not including GGPP_NCR) were averaged at homologous positions according to the alignment shown in Figure 4. The average index is plotted along the alignment positions. Predicted α-helix and β-sheet structures are shown below the plot.



Discussion 

Isoprenyl diphosphate synthases catalyze the basic chain elongation steps in the isoprenoid biosynthetic pathway. These reactions are ubiquitous in nature. Organisms contain 2 classes of isoprenyl diphosphate synthases, one for synthesis of short-chain C10-C20 molecules and another for longer-chain isoprenoids. The short-chain enzymes are further subdivided into specific enzymes for synthesis of geranyl, farnesyl, and geranylgeranyl diphosphate. The long-chain prenyltransferases are also subdivided by chain length selectivity and, in addition, specifically form either cis or trans double bonds in the newly added isoprene units. Amino acid sequences are now available for several short-chain FPPSases and GGPPSases and for 1 all-trans long-chain synthase. Comparisons of the primary sequences for the 13 isoprenyl diphosphate synthases shown in Figure 2 revealed 5 regions containing 2-10 highly conserved amino acids. Analysis of the multiply aligned sequences for the 11 FPPSases and GGPPSases shown in Figure 4, in conjunction with predictions of secondary structure, indicated that these enzymes are all-α-helix proteins or α/β structures dominated by α-helices.

Simple inspection of the aligned sequences suggested that individual members of the family diverged from a common ancestral isoprenyl diphosphate synthase during evolution (James et al., 1978; Bajaj & Blundell, 1984; Chothia & Lesk, 1986). A more quantitative analysis using the methods of Feng and Doolittle (1990) supported this hypothesis and provided significant insights into the pathway by which they evolved. The earliest branch was a functional segregation that separated the long-chain from the short-chain synthases, as illustrated by the large divergence between the FPPSase and the HexPPSase in yeast.

The second major branching evident in the phylogenetic tree presented in
Figure 3 segregates the short-chain-length prenyltransferases into 2
clusters regardless of chain length, one for eubacterial and
archaebacterial proteins, and another for eukaryotic enzymes. Many
organisms have distinct enzymes for synthesis of C10-C20 isoprenyl
diphosphates when these compounds serve as substrates for other enzymes.
Thus, one might have anticipated a primary clustering for the
short-chain enzymes according to chain length rather than kingdom.
However, M. thermoautotrophicum, a methanogenic archaebacterium, has a
single bifunctional short-chain prenyltransferase that provides both the
C15 precursor for synthesis of squalene and the C20 precursor for
synthesis of the distinctive isoprenoid glyceryl ether core membrane
lipids found in members of the archae kingdom (Chen & Poulter, 1993).
Thus, the archaebacterial enzyme may represent a primitive scenario in
which a single enzyme was responsible for short-chain synthesis. In this
case, the fine tuning of chain-length control would have evolved
independently after eukaryotes and eubacteria diverged. Additional
examples of eukaryotic GGPP synthases should help clarify this point.
The single exception to the clustering pattern for the short-chain
synthases is the chromoplastic GGPPSase from peppers, where the gene for
the enzyme may have been captured from an ancient bacterial symbiote.

It is unclear what mechanism regulates how many molecules of IPP are
added to the growing isoprenoid chain by a prenyltransferase, and there
appear to be no clues from the amino acid sequences shown in Figure 2.
This question will not be resolved until more sequence information is
available or X-ray structures are obtained for prenyltransferases with
different chain-length selectivities.

Although the correlations we discovered provide important clues about
the evolution of isoprenyl diphosphate synthases, the phylogenetic tree
is not complete. There are no sequences yet reported for a long-chain
cis double-bond synthase, and more examples of eukaryotic GGPP synthases
are needed to confirm the groupings we propose. With the high level of
activity in this area at present, these gaps should be filled in the
near future.

Fig. 6. The predicted average secondary structure of FPP synthases and
GGPP synthases. α-Helices (α1-8) and β-sheets (β1-4) are drawn as
rectangles and arrows, respectively. Loops (L1-8) are shown as curved
lines. Each secondary structural unit is numbered by its position in the
alignment shown in Figure 4 (see text for alternative views on α4-7 and
β2-3). The 5 conserved domains (shaded) are labeled I-V. The drawing is
not to scale.

Methods 

The protein sequences of all isoprenyl diphosphate synthases except
FPP_BST (Koyama et al., 1993) and FPP_CHI (Kroon et al., unpubl.) were
retrieved from the Swiss-Prot data bank using GCG programs (University
of Wisconsin Genetics Computer Group). The TREE program (Feng &
Doolittle, 1990) was used for pairwise comparisons, to perform multiple
sequence alignments, and to construct a phylogenetic tree. Refinements
in the tree were made according to the protocols described by Feng and
Doolittle (1990).

Average amino acid compositions of 6 FPP synthases and 6 GGPP synthases
(all except GGPP_NCR) were calculated using a spreadsheet. The
difference index between the average amino acid compositions of the
isoprenyl diphosphate synthases and those of all-α, all-β, α+β, and α/β
proteins was the sum of composition differences for each amino acid,
$\sum_i |C_{a,i} - C_{b,i}|$, where $C_{a,i}$ and $C_{b,i}$ are the
compositions of amino acid $i$ in the two sets being compared (Chou,
1989; Doolittle, 1992). The smaller the difference index, the closer the
comparison.
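The difference index is a plain sum of absolute differences between two composition vectors. A minimal Python sketch, assuming compositions are expressed as percentages over the 20 standard amino acids; the function names are illustrative, and the reference compositions for the four structural classes (taken from Chou, 1989, in the original) are not reproduced here.

```python
from collections import Counter

# The 20 standard amino acids; any other character is ignored (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> dict:
    """Percentage composition of each standard amino acid in one sequence."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def average_composition(seqs: list) -> dict:
    """Average of the per-sequence compositions (e.g. over 6 FPP synthases)."""
    comps = [composition(s) for s in seqs]
    return {aa: sum(c[aa] for c in comps) / len(comps) for aa in AMINO_ACIDS}


def difference_index(ca: dict, cb: dict) -> float:
    """Sum over amino acids i of |C_a,i - C_b,i|; smaller means more similar."""
    return sum(abs(ca[aa] - cb[aa]) for aa in AMINO_ACIDS)
```

In use, `difference_index(average_composition(synthase_seqs), class_reference)` would be computed once per structural class, where `synthase_seqs` and `class_reference` are hypothetical inputs; the class with the smallest index is the closest match.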

To predict an average secondary structure for FPPSases and GGPPSases, a
secondary structure was calculated for each protein by the GOR procedure
(Garnier et al., 1978) using GARNIER in PCGENE and the CF method (Chou &
Fasman, 1974) using PEPTIDESTRUCTURE in GCG. The secondary structures
were then arranged according to a multiple alignment truncated at both
N- and C-termini. Consensus assignments of α-helix, β-sheet, or turn
structures to each alignment position were determined when the assigned
structural feature appeared in more than half of the aligned sequences,
except at positions where gaps were inserted.
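The majority rule for consensus assignment can be sketched as follows. This assumes each predicted structure has already been written as a string aligned to the multiple alignment, using H (helix), E (sheet), T (turn), C (other) and '-' for gaps, and it reads "except at positions where gaps were inserted" as leaving any gapped column unassigned. Both the encoding and that reading are assumptions, not the GOR or CF output formats.

```python
from collections import Counter


def consensus_structure(aligned_ss: list, gap: str = "-") -> str:
    """Per-column majority consensus of aligned secondary-structure strings.

    A state (H, E or T) is assigned only when it occurs in more than half
    of the sequences; gapped columns and columns without such a majority
    are marked '.'.
    """
    n = len(aligned_ss)
    length = len(aligned_ss[0])
    if any(len(s) != length for s in aligned_ss):
        raise ValueError("all aligned strings must have the same length")
    out = []
    for col in zip(*aligned_ss):
        if gap in col:
            out.append(".")  # gap position: no consensus assigned
            continue
        state, count = Counter(col).most_common(1)[0]
        out.append(state if count > n / 2 and state in "HET" else ".")
    return "".join(out)
```

For example, `consensus_structure(["HHE-", "HHEE", "HTCE"])` assigns helix to the first two columns and sheet to the third, and leaves the gapped fourth column unassigned.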

The multiple sequence alignment itself may provide information about
turns and surface loops. Regions where gaps were inserted were normally
considered surface loops that accommodate insertion or deletion of a few
amino acids. Additional information about structure came from
hydropathy plots, where regions of high hydrophobicity usually correlate
with a buried β-sheet or a hydrophobic α-helix, whereas regions of high
hydrophilicity usually correlate with a surface loop or a turn. An
amphipathic α-helix normally occurs where there is no high or low peak
in the hydropathy plot. Hydropathy indices were calculated for each
prenyltransferase by the method of Kyte and Doolittle (1982) using
PEPTIDESTRUCTURE. These values were averaged to calculate a hydropathy
index at the corresponding positions in the multiple alignment. The
average indices were plotted along the alignment positions. The
consensus secondary structure was predicted by combining GOR and CF
structures, consideration of gaps in the alignment, and comparisons with
the average hydropathy plot. The hydrophobic nature of side chains at
positions containing L, I, V, M, F, W, Y, A, or C in at least 9 of 11
sequences was also indicated as I (for interior) to help visualize the
hydrophobicity of secondary structure units. Helical wheel projections
were constructed by HELWHEEL in PCGENE to facilitate analysis of the
surface of the α-helices in the consensus structure (Schiffer &
Edmundson, 1967).
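Averaging hydropathy over an alignment means mapping each sequence's residue-level values onto alignment columns and averaging only over the sequences without a gap in that column. A sketch under that assumption, using the published Kyte-Doolittle scale; the smoothing window of 7 is an assumed default, not a documented PEPTIDESTRUCTURE setting. The last function applies the "interior" rule described above (L, I, V, M, F, W, Y, A or C in at least 9 sequences).

```python
# Kyte & Doolittle (1982) hydropathy scale.
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
      "R": -4.5}
HYDROPHOBIC = set("LIVMFWYAC")


def residue_hydropathy(seq: str, window: int = 7) -> list:
    """Per-residue index averaged over a centered window, truncated at chain ends."""
    half = window // 2
    vals = [KD[aa] for aa in seq]
    out = []
    for i in range(len(vals)):
        lo, hi = max(0, i - half), min(len(vals), i + half + 1)
        out.append(sum(vals[lo:hi]) / (hi - lo))
    return out


def average_profile(aligned: list, window: int = 7, gap: str = "-") -> list:
    """Mean hydropathy at each alignment column over sequences with no gap there."""
    per_seq = []
    for row in aligned:
        values = iter(residue_hydropathy(row.replace(gap, ""), window))
        # Place each residue's value at its column; gaps get None.
        per_seq.append([None if c == gap else next(values) for c in row])
    profile = []
    for col in zip(*per_seq):
        present = [v for v in col if v is not None]
        profile.append(sum(present) / len(present) if present else None)
    return profile


def interior_marks(aligned: list, min_count: int = 9) -> str:
    """'I' where at least min_count sequences carry a hydrophobic residue, else '.'."""
    return "".join("I" if sum(c in HYDROPHOBIC for c in col) >= min_count else "."
                   for col in zip(*aligned))
```

The profile returned by `average_profile` is what would be plotted against alignment position, as in Fig. 5.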
Exhaustive and Iterative Clustering of the Protein Databank

by K. Kelly

Abstract. The unique high-resolution protein chains in the September 1998 edition of the Brookhaven Protein Databank
have been subjected to an exhaustive and iterative sequence- and structure-based clustering procedure to produce a 
database of alignments for homology identification and modeling. A novel feature of the procedure was the use of 
multiple-sequence, structure-based alignment to validate hypothesized clusters. The resulting database contains fewer than 
800 entries. Homology searches against this database, using rigorous sequence-to-group alignment, show high sensitivity 
and specificity when compared to existing methodologies. Four models were built based on families in the database and 
were submitted to the recently completed CASP3 competition. 

• Introduction

• Methods and Materials 

• Results and Discussion 

• References 

Introduction 

It is not yet possible to predict the 3D structure of an expressed protein from its amino acid sequence alone. Consequently,
inferences about the structure and function of new protein sequences are generally drawn from comparisons to sequences
for which experimental models already exist. To date, the most useful principle guiding such searches is the rule that
"similar sequence" implies "similar structure," where sequence similarity is determined by the well-established dynamic
programming paradigm (Needleman and Wunsch). Generally speaking, if a protein sequence shares more than 25% pairwise
similarity with a known structure, it usually also shares at least the broad outlines of the fold topology of the known
structure. However, the Protein Databank contains many pairs of similar structures that are remote homologues with less
(sometimes much less) than 25% pairwise sequence identity. Standard homology searching tools often have difficulty
identifying such remote relationships with any confidence, and even when a relationship is identified, the correct alignment
of the new sequence to the homologous structure can be difficult to judge.
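For readers unfamiliar with the dynamic programming paradigm cited above, a minimal Needleman-Wunsch global alignment is sketched below. It uses a toy match/mismatch score and a linear gap penalty; these scoring parameters are assumptions for illustration, whereas practical protein searches use substitution matrices such as BLOSUM or PAM and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks gaps.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # S[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,   # a[i-1] opposite a gap
                          S[i][j - 1] + gap)   # b[j-1] opposite a gap

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, aligning "ACGT" with "AGT" gives a score of 1 (three matches and one gap) with the alignment ACGT / A-GT.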

Researchers are actively working on strategies for improving the sensitivity and selectivity of homology searching as well 
as the accuracy of the alignments that are necessary to build homology models based on existing structures. Broadly 
speaking, one can divide such efforts into two classes. 

The first class comprises methods that seek to enhance the effectiveness of pure sequence-based queries by taking
advantage of multiple-sequence alignment information in the form of profiles from particular protein families. The use of
"signatures" (short amino acid sequence motifs judged to be characteristic of a family of related proteins) in the
PROSITE and BLOCKS databases represents such an effort (Henikoff et al), as does the use of profile analysis in the
recently developed PSI-BLAST (Altschul et al). Recently, Hidden Markov Models (HMMs) have been used to represent
the information contained in multiple-sequence alignments. For example, the Pfam database (Sonnhammer et al) contains
HMMs that represent various protein domains. It has been observed that the success of HMMs as homology detection
devices is highly dependent on the quality of the initial alignments used to generate them (Henikoff et al).
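Profile-based searching replaces a single query sequence with position-specific scores derived from an alignment. The sketch below builds a simple log-odds position-specific scoring matrix (PSSM) from a gap-free alignment, with an add-one pseudocount and uniform background frequencies (both assumptions), and slides it along a sequence without gaps. It illustrates the idea only; PSI-BLAST and the HMM tools behind Pfam use sequence weighting, more elaborate pseudocounts and gapped alignment.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list, pseudocount: float = 1.0, background=None) -> list:
    """Log2-odds score for every amino acid at every column of a gap-free alignment."""
    bg = background or {aa: 1 / 20 for aa in AMINO_ACIDS}  # uniform background (assumption)
    pssm = []
    for col in zip(*alignment):
        counts = Counter(c for c in col if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / bg[aa])
                     for aa in AMINO_ACIDS})
    return pssm


def best_ungapped_hit(pssm: list, seq: str):
    """Highest-scoring ungapped placement of the profile on seq as (offset, score).

    Returns None when seq is shorter than the profile.
    """
    width = len(pssm)
    best = None
    for off in range(len(seq) - width + 1):
        score = sum(pssm[k][seq[off + k]] for k in range(width))
        if best is None or score > best[1]:
            best = (off, score)
    return best
```

Conserved columns receive positive scores for the residues observed there and negative scores for the rest, which is what lets a profile recognize a family member that matches the conserved positions even when overall pairwise identity is low.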
The second class comprises efforts to develop "fold recognition" algorithms that make direct use of information about
structural features in the targets. The term "threading" is often used to refer collectively to such methods, though it was
first used to refer to the use of mean-force potentials of pairwise residue distances, as derived from databases of protein
structures (Sippl). Alternatively, several related methods have been developed in the last few years that use statistical
preferences of amino acids for certain structural environments to align an amino acid sequence to an environment string
using dynamic programming. Such methods may also modify gap penalties to take advantage of knowledge of where
insertions or deletions could realistically be made in a target structure. While some successes have been reported in
identifying very remote homologues, a recent study (Jaroszewski et al) observed that structure-only searching methods did
not perform as well as sequence-only methods over a set of several fold recognition benchmarks. The best results were
observed using hybrid schemes with mixed sequence/structure scoring matrices.

The goal of the MOE project described in this paper is to improve homology identification and modeling by taking
advantage of both the information implicit in multiple-sequence alignments and the structural information available from
experimental models. To do this, the structures in the Protein Databank were clustered into sets of proteins with related
sequences and similar structures, using sequence- and structure-based alignment methods to validate the hypothesized
clusters and to guarantee that the alignments, which will be used for homology searching and modeling, accurately
represent the structurally conserved cores of protein families.

This paper introduces the protein family database distributed in version 1998.10 of Chemical Computing Group's 
Molecular Operating Environment, describes the automated protocol used to build the database, and presents the test 
results which demonstrate improvements in the sensitivity of homology searches. 

Methods and Materials 

The raw material of this study included the contents of the September 1998 release of the Brookhaven Protein Databank
(Bernstein et al), as well as the contents of two public domain sequence databases: Swiss-Prot version 37 (Bairoch and
Apweiler) and the Protein Information Resource (PIR) version 57.0 (Barker et al). Only high-resolution X-ray models
from the PDB were used (resolution of 3.0 Angstroms or better); chains that were shorter than 25 residues or contained
internal chain breaks or non-standard amino acids were excluded. The following flowchart summarizes the overall
clustering protocol that was applied to this data:



[Flowchart of the clustering protocol: data pruned at 90% mutual identity; search PIR and Swiss-Prot]




The first step was to perform all-against-all sequence alignments between all the unique protein sequences in the PDB that
met the criteria described above. Based on the results of the sequence comparisons, links were created between any two
chains whenever 90% of the residues of the longer chain were aligned against identical amino acids in the shorter chain.
Using these links, the chains were clustered on a single-linkage basis; that is, all linked chains were put into the same
cluster. Each cluster was then represented by one chain in the dataset that was submitted to the full iterative clustering
procedure.
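The pruning step amounts to building a graph whose edges are the 90% identity links and taking its connected components. A union-find sketch, assuming the number of identical aligned residues for each chain pair has already been obtained from the all-against-all alignments; `identity_links` and `single_linkage` are illustrative names, not MOE functions. How the representative chain of each cluster was chosen is not stated, so the sketch stops at the clusters.

```python
def identity_links(lengths: list, identical: dict, threshold: float = 0.9) -> list:
    """Pairs (i, j) whose identical aligned residues cover >= threshold of the longer chain.

    identical[(i, j)] is the count of aligned identical residues (assumed precomputed).
    """
    return [(i, j) for (i, j), same in identical.items()
            if same >= threshold * max(lengths[i], lengths[j])]


def single_linkage(n: int, links: list) -> list:
    """Connected components of n items under the given links (union-find)."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in links:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    clusters = {}
    for k in range(n):
        clusters.setdefault(find(k), []).append(k)
    return sorted(clusters.values())
```

Because linkage is transitive, chains 0 and 2 end up in the same cluster whenever each is linked to chain 1, even if chains 0 and 2 fall below the threshold with each other.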

These chains were used as queries for sequence similarity searches of the PIR and Swiss-Prot databases. The criteria used 
to return matches from these searches were highly conservative, as there would be no structural information to confirm 
any hypothesized homologies. The purpose of collecting this extra sequence data prior to embarking on the iterative 
clustering was primarily to improve alignment quality, though it was also possible that additional links between PDB
chains might be identified that would otherwise have been missed. 

Once the structure data had been reduced to a (relatively) non-redundant subset, and the PIR and Swiss-Prot data had been 
searched, the iterative phase of the procedure was entered. Each iteration began with a set of clusters. The first iteration 
began with a set of clusters containing one PDB entry each, and possibly augmented by sequences recruited from PIR or 
Swiss-Prot. 

Using MOE-Align, with the sequence-only option enabled, each cluster was aligned against each other cluster, and a
Z-score was recorded. Z-scores were calculated by comparing raw alignment scores to a random distribution generated by
random permutations of sequences in one or the other of the clusters being aligned. Then, single-linkage clustering was
performed, where a link was considered to exist between two clusters if the Z-score was high enough.
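A Z-score measures how many standard deviations the real alignment score lies above the scores obtained after destroying sequence order by shuffling. The sketch below shuffles one of two single sequences and takes the scoring function as a parameter. `identity_score` is a toy, ungapped stand-in for MOE-Align's alignment score, and 100 shuffles is an assumed default; for cluster-versus-cluster comparisons, as in the text, the shuffling would be applied to the sequences of one cluster.

```python
import math
import random
import statistics
from typing import Callable


def permutation_z_score(score_fn: Callable[[str, str], float], a: str, b: str,
                        n_shuffles: int = 100, seed: int = 0) -> float:
    """Z-score of score_fn(a, b) against scores of a versus shuffled copies of b.

    Shuffling keeps b's composition but destroys its order, so the shuffled
    scores approximate the distribution expected by chance.
    """
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a spread")
    rng = random.Random(seed)
    observed = score_fn(a, b)
    null = []
    for _ in range(n_shuffles):
        letters = list(b)
        rng.shuffle(letters)
        null.append(score_fn(a, "".join(letters)))
    mu = statistics.mean(null)
    sd = statistics.stdev(null)
    if sd == 0:
        # Degenerate null distribution (e.g. a homopolymer): no spread to scale by.
        return 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    return (observed - mu) / sd


def identity_score(x: str, y: str) -> int:
    """Toy ungapped score: number of positions holding identical residues."""
    return sum(p == q for p, q in zip(x, y))
```

A link between two clusters would then be declared when the returned value exceeds some cutoff; the text does not give the cutoff used.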

This procedure created a set of hypothesized clusters, each of which was submitted to a validation protocol that, roughly
speaking, sought to establish whether all of the chains in the proposed cluster were "sufficiently superposable." The
validation procedure, which uses the MOE tools MOE-Align, MOE-Consensus and MOE-ProSuperpose, is summarized by
the following chart:




[Flowchart of the validation procedure: MOE-Align, MOE-Consensus, MOE-ProSuperpose]



MOE-Align was used to apply sequence- and structure-based alignment to the hypothesized cluster. This procedure is
discussed in some detail in an earlier JCCG feature (Multiple Sequence and Structure Alignment in MOE), but a brief
summary is as follows:

1. Multiple sequence alignment, using tree-based build-up and randomized iterative refinement.

2. Global multi-body superposition of the alpha-carbon traces.

3. Re-alignment using dynamic programming, where the scoring matrix was derived from the 3D positions of the
alpha-carbons after superposition.

4. Re-superposition using the new alignment. If the RMSD has improved, go to step 3; otherwise, terminate.
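Steps 3 and 4 form a simple fixed-point loop. The sketch below expresses only that control flow: `superpose` and `realign` are hypothetical callables standing in for the MOE superposition and dynamic-programming stages, and both the choice to keep the last improving alignment when the RMSD stops decreasing and the iteration cap are assumptions.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")  # an alignment, in whatever representation the caller uses
C = TypeVar("C")  # superposed coordinates


def refine_alignment(alignment: A,
                     superpose: Callable[[A], Tuple[C, float]],
                     realign: Callable[[C], A],
                     max_iter: int = 50) -> Tuple[A, float]:
    """Alternate re-alignment and re-superposition while the RMSD improves.

    superpose(alignment) -> (coords, rmsd)   # fit on the aligned positions
    realign(coords) -> alignment             # step 3: DP on 3D distances
    """
    coords, rmsd = superpose(alignment)
    for _ in range(max_iter):               # safety cap (assumption)
        candidate = realign(coords)
        new_coords, new_rmsd = superpose(candidate)
        if new_rmsd >= rmsd:                # step 4: no improvement, stop
            break
        alignment, coords, rmsd = candidate, new_coords, new_rmsd
    return alignment, rmsd
```

Because the RMSD must strictly decrease on every accepted pass, the loop cannot cycle, and the cap only matters if improvements become vanishingly small.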

After the alignment stage, the hypothesized cluster was tested for acceptance. In view of the express purpose of this
database, which is to create clusters from which accurate alignments and models can be built, the acceptance criterion
was as follows:

The maximal set of alignment positions such that the worst pairwise RMSD over these positions was less than 3.0
Angstroms was determined. If this set of alignment positions spanned a sufficient percentage (75%) of the length of each
chain, then the cluster was accepted; otherwise it was rejected. If the cluster was rejected, then all subsets with more than
two members were tested in the same fashion.
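The text does not say how the maximal set of positions was found. The sketch below substitutes a greedy heuristic: repeatedly drop the column with the largest deviation in the currently worst chain pair until every pair is below 3.0 Angstroms, then check 75% coverage. It is an approximation that need not find the true maximal set, it does not re-fit the superposition as columns are dropped, and the re-testing of subsets with more than two members is not shown.

```python
import math
from typing import Dict, List, Set, Tuple


def rmsd(devs: List[float], positions: Set[int]) -> float:
    """RMSD of per-position C-alpha distances over the selected positions."""
    return math.sqrt(sum(devs[p] ** 2 for p in positions) / len(positions))


def accept_cluster(pair_devs: Dict[Tuple[int, int], List[float]],
                   chain_lengths: List[int],
                   cutoff: float = 3.0,
                   min_coverage: float = 0.75) -> Tuple[bool, Set[int]]:
    """Greedy version of the acceptance test.

    pair_devs[(i, j)][p] is the distance (Angstroms) between the aligned
    C-alpha atoms of chains i and j at alignment column p after superposition.
    Columns are assumed gap-free (a simplification).
    """
    n_cols = len(next(iter(pair_devs.values())))
    positions = set(range(n_cols))
    while positions:
        worst = max(pair_devs, key=lambda pr: rmsd(pair_devs[pr], positions))
        if rmsd(pair_devs[worst], positions) < cutoff:
            break
        # Drop the column contributing most to the worst pair's RMSD.
        devs = pair_devs[worst]
        positions.remove(max(positions, key=lambda p: devs[p]))
    accepted = bool(positions) and all(
        len(positions) >= min_coverage * length for length in chain_lengths)
    return accepted, positions
```

With ten columns of which two deviate by 10 Angstroms, dropping those two leaves 8 of 10 columns (80%) and the cluster passes; with three such columns only 70% remain and it fails.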

The iterative clustering procedure was terminated when an iteration failed to produce any new clusters. 

Results and Discussion 

The 2300 chains were distributed into 755 families in the final database. 500 clusters contained only one PDB entry; there
were 83 entries with more than 3 chains. By way of comparison, the SCOP database (Murzin et al) release 1.37, based on
the October 1997 version of the Brookhaven PDB, distributed the proteins into slightly more than 800 "families" grouped
into over 600 "superfamilies", where families were considered to possess clear evolutionary relationships, and
superfamilies a "probable" evolutionary relationship. The clusters in MOE's family database generally corresponded to
SCOP's families, though some spanned part or all of a superfamily. Some SCOP families were distributed in the MOE
database into more than one cluster if the members were not adequately superposable. For example, the kinases were
distributed into three families, with twitchin (PDB entry 1KOB) isolated by virtue of divergence from the other kinases in
the C-terminal region, and the insulin-dependent kinases, which superpose to an RMSD of about 4.7 Angstroms to the
other kinases, also put in their own cluster. There were other instances where SCOP families were merged; for example,
the trypsins and serine proteases can be globally superposed to within 1.4 Angstroms RMSD (see Protein Analysis in
MOE: The Serine Proteases).

As one of the express purposes of building this database was to improve the efficiency of remote homology detection, the
following test was performed. The clusters were examined for instances of chains with less than 25% pairwise sequence
identity to at least two other members of their cluster. Each such chain was then extracted from its cluster, any chains of
higher than 25% identity to it were discarded, and the remaining chains were re-aligned. Then, the extracted chain was
aligned, using sequence information only, against each remaining member of the cluster and against the cluster as a
whole, and the maximum Z-score achieved in the pairwise alignments was compared to the Z-score calculated against the
cluster. For comparison's sake, the same measurements were then made using 25% and 50% identity thresholds.
The results are summarized in the following table.



Maximum %id    Pairwise Z-score    Cluster Z-score    Difference
<25            7                   10.5               3
25-50          13                  17                 4
>50            32                  26                 -6
It is worth noting that the increase in the average Z-score among the more remote homologues lifted the strength of the
sequence homology signal well out of the "noise" range. The table below contains some examples of the boost in
Z-scores that was observed.



Family                      PDB entry    % identity    Pairwise Z-score    Cluster Z-score
Globin                      iimA         22            8.3                 13.5
Lectin                      1LCL         25            9.7                 10.9
Isocitrate dehydrogenase    1IDC         23.6          12.3                14.7



While a comprehensive comparison to other standard searching methods has not yet been made, there are various examples
in the database of clusters which include remote homologues that appear not to be detected by PSI-BLAST or Pfam. For
example, consider the ferredoxin oxidoreductase represented in the PDB by accession number 1A8P.




In MOE's family database, 1A8P was clustered with a set of five other protein chains. The percentage identities and
pairwise RMSD values are shown in the tables below.



1A8P NADPH:FERREDOXIN OXIDOREDUCTASE

1CNF OXIDOREDUCTASE (NITROGENOUS ACCEPTOR)

1NDH ELECTRON TRANSPORT (FLAVOPROTEIN)

1FNB OXIDOREDUCTASE (NADP+ (A), FERREDOXIN (A))

1QUE FERREDOXIN--NADP+ REDUCTASE
Percentage identity

           1A8P     1CNF     1NDH     1FNB     1QUE     1FDR
1 1A8P:    100.0    15.4     11.9     13.2     12.2     30.7
2 1CNF:    15.6     100.0    35.6     10.8     12.2     13.9
3 1NDH:    12.5     36.9     100.0    12.5     12.9     16.8
4 1FNB:    15.2     12.3     13.7     100.0    47.9     16.8
5 1QUE:    14.4     14.2     14.4     49.0     100.0    15.2
6 1FDR:    29.2     13.1     15.2     13.9     12.2     100.0



Pairwise RMSD (Angstroms)

- lower triangle is pairwise superposition RMSD

- upper triangle is difference between pairwise and global RMSD

           1A8P     1CNF     1NDH     1FNB     1QUE     1FDR
1 1A8P:    0.000    0.000    0.000    0.000    0.000    0.000
2 1CNF:    2.896    0.000    0.000    0.000    0.000    0.000
3 1NDH:    2.934    1.570    0.000    0.000    0.000    0.000
4 1FNB:    2.733    2.975    3.139    0.000    0.000    0.000
5 1QUE:    2.857    3.039    3.193    0.818    0.000    0.000
6 1FDR:    1.724    2.575    2.637    2.714    2.883    0.000

pro_Superpose: global RMSD = 2.660

These proteins all contain two globular domains: an oxidoreductase FAD/NAD binding domain at the C-term, and a
cytochrome reductase domain at the N-term. When 1A8P was extracted and submitted as a query to PSI-BLAST, only
1FDR was picked up above the default significance threshold. Pfam version 3.3 (December 1998) reported homology
only to the FAD/NAD binding domain, and not to the N-term domain, despite the fact that models of both domains were
in the database.

When the sequence of target 62 from the recently concluded structure prediction competition CASP3 was used as a query,
this family was reported as globally homologous with a very large Z-score (over 12). Again, PSI-BLAST and Pfam
reported homologies only to the C-term domain. A model of target 62 was submitted to CASP3 based on this family.

The other models submitted to CASP3 based on alignments to this database were targets 55, 69 and 82. The results will be
discussed when they become publicly available. With the exception of target 69, the percentage identities of these
sequences to the templates found in the database were on the order of 20%.